316 ◾ Bioinformatics
-2 fastq_pure/ERR1823587_pure_R2-50.fastq.gz \
--only-assembler \
--threads 4 \
--memory 16 \
--phred-offset 33 \
-k 51
mkdir metag_moderate
metaspades.py \
-o metag_moderate \
-1 fastq_pure/ERR1823601_pure_R1-50.fastq.gz \
-2 fastq_pure/ERR1823601_pure_R2-50.fastq.gz \
--only-assembler \
--threads 4 \
--memory 16 \
--phred-offset 33 \
-k 51
mkdir metag_severe
metaspades.py \
-o metag_severe \
-1 fastq_pure/ERR1823608_pure_R1-50.fastq.gz \
-2 fastq_pure/ERR1823608_pure_R2-50.fastq.gz \
--only-assembler \
--threads 4 \
--memory 16 \
--phred-offset 33 \
-k 51
Run “metaspades.py --help” to read about the usage and options of this program.
Several files are produced in the output directories “metag_healthy”, “metag_moderate”,
and “metag_severe”. The files that contain the assembled sequences are “contigs.fasta”
and “scaffolds.fasta”. Contigs are built from read overlaps. The contigs are then
ordered, oriented, and connected, with gaps filled with Ns, to form the scaffolds. The K51
directory contains the individual result files for the assembly with 51-mers. When
multiple K directories are present, however, the best assembled sequences are the ones
stored outside these K directories. The directory “misc” contains broken scaffolds.
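As a quick check of the assembled sequences, the following minimal sketch counts the sequences and total bases in a FASTA file. The function name “fasta_stats” is hypothetical (it is not part of SPAdes), and the sketch assumes that the output directories and “contigs.fasta” files are named as described above.

```shell
# Hypothetical helper (not part of SPAdes): print
# "<number of sequences> <total bases>" for a FASTA file.
fasta_stats() {
    awk '
        /^>/ { n++; next }          # header line: count one sequence
        {
            gsub(/[ \t\r]/, "")     # drop stray whitespace and carriage returns
            total += length($0)     # sequence line: add its length
        }
        END { printf "%d %d\n", n, total }
    ' "$1"
}

# Summarize the contigs of each assembly, if the file is present.
for d in metag_healthy metag_moderate metag_severe; do
    if [ -f "$d/contigs.fasta" ]; then
        printf '%s\t' "$d"
        fasta_stats "$d/contigs.fasta"
    fi
done
```

The same function can be pointed at “scaffolds.fasta” to compare the number of scaffolds with the number of contigs.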
The file with the “.gfa” extension is in the Graphical Fragment Assembly (GFA) file format,
in which the sequences are represented by lines starting with “S” and the overlaps between
sequences are represented by lines starting with “L”, as shown in Figure 8.3. The plus (+)
and minus (−) signs indicate whether the overlapping sequence is the original or its reverse
complement. The value of the form “XM” in a link gives the overlap length, where X is the
number of overlapping bases.
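The layout of the “S” and “L” lines can be explored with a short awk script. The sketch below is only an illustration: the function name “gfa_summary” is hypothetical, and it assumes tab-separated GFA lines in which a link line holds the two segment names, their orientations, and the overlap in columns 2 to 6.

```shell
# Hypothetical helper: list the links of a GFA file and count
# its segment (S) and link (L) lines.
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++ }
        $1 == "L" {
            links++
            # columns: from-segment, orientation, to-segment, orientation, overlap
            print "link", $2 $3, "->", $4 $5, "overlap=" $6
        }
        END { printf "segments=%d links=%d\n", segments, links }
    ' "$1"
}

# Example usage on one of the assemblies:
# gfa_summary metag_healthy/assembly_graph_with_scaffolds.gfa | tail -n 1
```

Because every link is printed on its own line, the output can be piped to “tail -n 1” when only the totals are of interest.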
Thus, the file “assembly_graph_with_scaffolds.gfa” generated by metaSPAdes is the
GFA file that represents the final assembly of the metagenomes in the sample. SPAdes builds
this assembly graph from k-mers formed from the reads (vertices) and their overlaps
(edges). The assembler then resolves paths across the assembly graph and outputs
non-branching paths as contigs.
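To make the idea of k-mers and their overlaps concrete, the toy sketch below lists the k-mers of a single sequence and the links between consecutive k-mers, which overlap by k−1 bases. It follows the vertex and edge description above and is not how SPAdes is actually implemented; the function names “kmers” and “kmer_edges” are hypothetical.

```shell
# Hypothetical helper: print every k-mer of a sequence, one per line.
# $1 = sequence, $2 = k
kmers() {
    awk -v seq="$1" -v k="$2" 'BEGIN {
        for (i = 1; i + k - 1 <= length(seq); i++)
            print substr(seq, i, k)
    }'
}

# Hypothetical helper: print one edge per pair of consecutive k-mers.
# Consecutive k-mers share k-1 bases, which is the overlap the edge stands for.
kmer_edges() {
    awk -v seq="$1" -v k="$2" 'BEGIN {
        n = length(seq) - k + 1      # number of k-mers in the sequence
        for (i = 1; i < n; i++)
            print substr(seq, i, k), "->", substr(seq, i + 1, k)
    }'
}

# Example usage:
# kmers ACGTA 3        # one 3-mer per line
# kmer_edges ACGTA 3   # one "kmer -> kmer" edge per line
```

A real assembler works on millions of reads at once and merges identical k-mers from different reads into one vertex, which is where branching paths in the graph come from.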